Using Syllables As Indexing Terms in Full-Text Information Retrieval
Abstract
This paper describes empirical results of information retrieval in 13 languages of the Cross Language Evaluation Forum (CLEF) collection, augmented with results for Turkish, using syllables as a means of managing morphological variation in the languages. This kind of approach has been used in speech retrieval (e.g. Larson and Eickeler 2003), but it has rarely been tried in text-based IR, although it has several clear advantages. Firstly, a reasonably well-working version of it can be implemented with a very simple syllabification algorithm consisting only of variants of one syllable structure rule, CV (consonant-vowel). Secondly, although syllable-based management of word form variation resembles n-gramming (McNamee and Mayfield 2004), syllabification yields a more restricted set of grams, which keeps the text index smaller and retrieval faster. Thirdly, a syllable-based approach makes it possible to use different types of syllabification procedures, ranging from very fine-grained, i.e. language-specific, to very coarse, i.e. more language-independent. Fourthly, syllable-based methods work for both speech and text retrieval. Our results show that the two different CV syllabification procedures produced good results with four morphologically complex languages of the CLEF collection, and also with Turkish. For three of the languages that got good results with CV syllabification (German, Finnish and Turkish), we also tried language-specific, accurate syllabification procedures. Accurate syllabification did not produce IR results as good as those of the CV procedures.
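To illustrate the idea of a single CV rule, here is a minimal sketch in Python. It is not the procedure evaluated in the paper, whose exact rule variants are not given in this abstract. The function names `cv_syllabify` and `index_terms`, the vowel inventory `VOWELS`, and the choice to start a new syllable at every consonant that is directly followed by a vowel are all assumptions made for illustration.

```python
# Hypothetical vowel inventory; a real system would tune this per language.
VOWELS = frozenset("aeiouyäöüåı")


def cv_syllabify(word, vowels=VOWELS):
    """Split a word with one coarse CV rule (an illustrative assumption):
    a new syllable starts at every consonant directly followed by a vowel.
    Any character not in `vowels` is treated as a consonant."""
    word = word.lower()
    if not word:
        return []
    syllables = []
    start = 0
    for i in range(1, len(word)):
        is_consonant = word[i] not in vowels
        next_is_vowel = i + 1 < len(word) and word[i + 1] in vowels
        if is_consonant and next_is_vowel:
            syllables.append(word[start:i])
            start = i
    syllables.append(word[start:])
    return syllables


def index_terms(text):
    """Turn whitespace-separated words into a flat list of syllable terms."""
    terms = []
    for word in text.split():
        terms.extend(cv_syllabify(word))
    return terms


if __name__ == "__main__":
    print(cv_syllabify("talossa"))
    print(index_terms("information retrieval"))
```

Under this rule, inflected forms of a word tend to share their leading syllables, which is how syllable terms can stand in for stemming or lemmatization; a language-specific syllabifier would replace `cv_syllabify` while leaving `index_terms` unchanged.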
Similar resources
Journal of Emerging Trends in Computing and Information Sciences::Automatic Learning Context Tool for Effective Personal Document Indexing and Retrieval
Managing digital documents has become a time consuming process due to sheer scale. Most users manage their personal documents by creating logical hierarchical folder structures. This logical structure depends on the user’s assessment of the context of the document. Basic file structuring has not been changed for decades and hierarchical file structure remains the same. But there has been a surg...
Co-Operative DSIR Text Indexing System
The unceasing development of the Internet technology currently revolutionizes the way we look for relevant information. Since the number of web pages is uncountable, and very disorganized, a powerful searching tool like Information Retrieval (IR) system is needed. In this paper, we propose a co-operative indexing system called “DSIR”. Co-operative DSIR is a full text vector space based indexing...
A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine
Background and Aim: The purpose of this study was to compare the impact of text based indexing and folksonomy in image retrieval via Google search engine. Methods: This study used experimental method. The sample is 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...
Exemplary documents: a foundation for information retrieval design
Documents are generally represented for retrieval by either extracting index terms from them or by creating and selecting from an external set of candidate terms. There are many procedures for doing this, but while work continues along these dimensions, there have been relatively few attempts to change this basic process. Of particular importance is the creation of indexing schemes for retrieva...
NLP for Indexing and Retrieval of Captioned Photographs
We present a text-based approach for the automatic indexing and retrieval of digital photographs taken at crime scenes. Our research prototype, SOCIS, goes beyond keyword-based approaches and methods that extract syntactic relations from captions; it relies on advanced Natural Language Processing techniques in order to extract relational facts. These relational facts consist of a “pragmatic rel...
Publication date: 2010